Longest - match String Searching for Ziv – Lempel Compression timothy

نویسندگان

  • Timothy Bell
  • David Kulp
چکیده

SUMMARY Ziv–Lempel coding is currently one of the more practical data compression schemes. It operates by replacing a substring of a text with a pointer to its longest previous occurrence in the input, for each coding step. Decoding a compressed file is very fast, but encoding involves searching at each coding step to find the longest match for the next few characters. This paper presents eight data structures that can be used to accelerate the searching, including adaptations of four methods normally used for exact matching searching. The algorithms are evaluated analytically and empirically, indicating the trade-offs available between compression speed and memory consumption. Two of the algorithms are well-known methods of finding the longest match—the time-consuming linear search, and the storage-intensive trie (digital search tree). The trie is adapted along the lines of a PATRICIA tree to operate economically. Hashing, binary search trees, splay trees and the Boyer–Moore searching algorithm are traditionally used to search for exact matches, but we show how these can be adapted to find longest matches. In addition, two data structures specifically designed for the application are presented.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Longest-match String Searching for Ziv-Lempel Compression

Ziv-Lempel coding is currently one of the more practical data compression schemes. It operates by replacing a substring of a text with a pointer to its longest previous occurrence in the input, for each coding step. Decoding a compressed le is very fast, but encoding involves searching at each coding step to nd the longest match for the next few characters. This paper presents eight data struct...

متن کامل

Fast Multi-Match Lempel-Ziv

One of the most popular encoder in the literature is the LZ78, which was proposed by Ziv and Lempel in 1978. After the original paper was published, many variants of the LZ78 were proposed to improve its performance. Simulation results have shown that the well known LZW version has an improvement around 10%. Given a sequence u ∈ A, where A is the source alphabet, all versions of LempelZiv parse...

متن کامل

A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text

We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search a pattern in a text without uncompressing. This is a highly relevant issue, since it is essential to have compressed text databases where eecient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...

متن کامل

Fast Relative Lempel-Ziv Self-index for Similar Sequences

Recent advances in biotechnology and web technology are generating huge collections of similar strings. People now face the problem of storing them compactly while supporting fast pattern searching. One compression scheme called relative Lempel-Ziv compression uses textual substitutions from a reference text as follows: Given a (large) set S of strings, represent each string in S as a concatena...

متن کامل

Lempel-Ziv factorization: Simple, fast, practical

For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1993